Efficient Lineage for SUM Aggregate Queries
Abstract
AI systems typically make decisions and find patterns in data by computing aggregate functions, specifically sums, expressed as queries over the data’s attributes. This computation can become costly or even inefficient when the queries concern all or large parts of the data, especially when dealing with big data. New types of intelligent analytics also require an explanation of why something happened. In this paper we present a randomised algorithm that constructs a small summary of the data, called Aggregate Lineage, which can approximate well and explain all sums with large values in time that depends only on the size of the summary. The size of Aggregate Lineage is practically independent of the size of the original data. Our algorithm does not assume any knowledge of the set of sum queries to be approximated.
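The abstract does not spell out how the summary is built. As a minimal sketch of the general idea, assume the summary is a sample of tuple identifiers drawn with replacement, each with probability proportional to its value; this is an assumption for illustration, not a statement of the paper's exact construction. The names `build_lineage`, `estimate_sum`, and the parameter `b` (the summary size) are hypothetical. A sum over any subset of tuples is then estimated from the fraction of sampled identifiers that fall in that subset, so the query time depends only on `b`, and the sampled identifiers themselves point back to the tuples that contribute most to the sum.

```python
import random
from collections import Counter


def build_lineage(values, b, seed=None):
    """Draw b tuple ids with replacement, with probability proportional to value.

    `values` maps a tuple id to a non-negative number (at least one must be
    positive). Returns (counts, total, b), where `counts` records how many
    times each id was drawn. This is an illustrative sketch, not the paper's
    exact algorithm.
    """
    rng = random.Random(seed)
    ids = list(values)
    weights = [values[i] for i in ids]
    total = sum(weights)
    sample = rng.choices(ids, weights=weights, k=b)
    return Counter(sample), total, b


def estimate_sum(lineage, predicate):
    """Estimate the sum of values over all ids that satisfy `predicate`.

    Only the sampled ids are inspected, so the cost depends on the summary
    size, not on the size of the original data.
    """
    counts, total, b = lineage
    hits = sum(c for i, c in counts.items() if predicate(i))
    return total * hits / b


# Hypothetical data: 1000 tuples, each with value 1.
data = {i: 1 for i in range(1000)}
lineage = build_lineage(data, b=2000, seed=0)
approx_even = estimate_sum(lineage, lambda i: i % 2 == 0)
```

Under this sampling scheme the estimate is unbiased for any subset, and its relative error is small when the true sum is a large share of the total, which matches the abstract's focus on sums with large values.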
Similar resources
A dynamic method for answering ad hoc continuous aggregate queries
Data streams are infinite, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries on them are mostly one-pass. Executing such algorithms raises challenges such as memory limitations, scheduling, and the accuracy of answers. They will be more ...
Efficient SUM Query Processing over Uncertain Data
SUM queries are crucial for many applications that need to deal with probabilistic data. In this paper, we are interested in the queries, called ALL_SUM, that return all possible sum values and their probabilities. In general, there is no efficient solution for the problem of evaluating ALL_SUM queries. But, for many practical applications, where aggregate values are small integers or real numb...
SUM Query Processing over Probabilistic Data
SUM queries are crucial for many applications that need to deal with probabilistic data. In this report, we are interested in the queries, called ALL_SUM, that return all possible sum values and their probabilities. In general, there is no efficient solution for the problem of evaluating ALL_SUM queries. But, for many practical applications, where aggregate values are small integers or real num...
Efficient Evaluation of HAVING Queries on a Probabilistic Database
We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTI...
Efficient external memory structures for range-aggregate queries
We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R^d, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The aggregates we consider in this paper include COUNT, SUM, and MAX. First, we develop a structure for ...
Journal title:
- AI Commun.
Volume 28, Issue
Pages -
Publication date 2015